A More Cohesive Summarizer

Authors

  • Christian Smith
  • Henrik Danielsson
  • Arne Jönsson
Abstract

We have developed a cohesive, extraction-based single-document summarizer (COHSUM) that relies on coreference links in a document. The sentences that provide the most references to other sentences, and that other sentences refer to most often, are considered the most important and are therefore extracted. Before evaluating summary quality, we also performed a corpus analysis of the original documents in the dataset to investigate the distribution of coreferences. The quality of the summaries is evaluated in terms of content coverage and cohesion. Content coverage is measured by comparing the summaries to manually created gold standards, and cohesion is measured by counting the broken and intact coreferences in the summary relative to the original texts. The summarizer is compared to the summarizers from DUC 2002 and to a baseline consisting of the first 100 words. The results show that COHSUM, which aims only at maintaining a cohesive text, performed better than the other summarizers regarding text cohesion and on par with the other summarizers and the baseline regarding content coverage.
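The selection and cohesion measures described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes coreference links are already available as (referring sentence, referent sentence) index pairs from some external resolver, that a sentence's score is simply its count of outgoing plus incoming links, and that a link counts as broken when the referring sentence is kept but its referent is not. The function names and the 100-word budget (borrowed from the baseline length) are likewise illustrative assumptions.

```python
def rank_sentences(num_sentences, links):
    """Rank sentence indices by number of coreference links (assumed scoring)."""
    scores = [0] * num_sentences
    for src, dst in links:
        if src != dst:          # ignore links inside a single sentence
            scores[src] += 1    # the sentence refers to another sentence
            scores[dst] += 1    # the sentence is referred to
    # Highest score first; ties are broken by document order.
    return sorted(range(num_sentences), key=lambda i: (-scores[i], i))


def summarize(sentences, links, max_words=100):
    """Greedily pick top-ranked sentences that fit a word budget (assumed)."""
    chosen, used = [], 0
    for i in rank_sentences(len(sentences), links):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return sorted(chosen)       # restore original document order


def coreference_integrity(selected, links):
    """Count intact and broken links among the selected sentences."""
    kept = set(selected)
    intact = sum(1 for s, d in links if s != d and s in kept and d in kept)
    broken = sum(1 for s, d in links if s in kept and d not in kept)
    return intact, broken


# Toy document with hand-made links (hypothetical data).
sentences = [
    "Anna bought a car.",
    "She drives it daily.",
    "The weather was cold.",
    "It broke down in May.",
]
links = [(1, 0), (1, 0), (3, 0)]  # "She"->Anna, "it"->car, "It"->car

summary = summarize(sentences, links, max_words=8)
print([sentences[i] for i in summary])
print(coreference_integrity(summary, links))
```

With these toy links, the first two sentences are selected and both of their coreference links stay intact, whereas selecting sentences 1 and 3 alone would leave all three links broken.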

Similar resources

The effects of analysing cohesion on document summarisation

We argue that, in general, the analysis of lexical cohesion factors in a document can drive a summarizer, as well as enable other content characterization tasks. More narrowly, this paper focuses on how one particular cohesion factor, simple lexical repetition, can enhance an existing sentence extraction summarizer, by enabling strategies for overcoming some particularly jarring end-user effects in...

Lexical cohesion, discourse segmentation and document summarization

Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for mitigating these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...

Cohesion and coherence for Automatic Summarization

This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...

Integrating cohesion and coherence for Automatic Summarization

This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...

Enhancing extraction based summarization with outside word space

We present results from improving vector-space-based extraction summarizers. The summarizer uses Random Indexing and PageRank to extract those sentences whose importance is ranked highest for a document, based on vector similarity. Originally the summarizer used only word vectors based on the words in the document to be summarized. By using a larger word space model the performance of the sum...

Publication date: 2012